Learning to simulate high energy particle collisions from unlabeled data

In many scientific fields that rely on statistical inference, simulations are often used to map from theoretical models to experimental data, allowing scientists to test model predictions against experimental results. Experimental data is often reconstructed from indirect measurements, causing the aggregate transformation from theoretical models to experimental data to be poorly described analytically. Instead, numerical simulations are used at great computational cost. We introduce Optimal-Transport-based Unfolding and Simulation (OTUS), a fast simulator based on unsupervised machine learning that is capable of predicting experimental data from theoretical models. Without the aid of current simulation information, OTUS trains a probabilistic autoencoder to transform directly between theoretical models and experimental data. Identifying the probabilistic autoencoder’s latent space with the space of theoretical models causes the decoder network to become a fast, predictive simulator with the potential to replace current, computationally costly simulators. Here, we provide proof-of-principle results on two particle physics examples, Z-boson and top-quark decays, but stress that OTUS can be widely applied to other fields.


Supplementary Ablation Study
In this section we show the results of an ablation study to demonstrate the effect of the various hyperparameters. As seen in our final loss function, Equation (11), the main hyperparameters of our approach are the coefficient $\lambda$ in front of the latent-space loss, as well as the coefficients $\beta_E$ and $\beta_D$ weighting the anchor losses for the encoder and decoder, respectively. For the semileptonic $t\bar{t}$ study the only hyperparameter is $\lambda$, as the anchor loss is redundant with the choice of a ResNet [1] architecture (see Section 6.2.3). We performed the ablations by retraining the models as in Section 6.3.3 with different values of the hyperparameters on a grid and comparing the results on validation data.
For studying the effect of $\lambda$, we reran both the $Z \to e^+e^-$ and the semileptonic $t\bar{t}$ studies with $\lambda \in \{0.001, 0.01, 0.1, 1, 10, 100, 1000\}$, while keeping all other hyperparameters unchanged (specifically, in the $Z \to e^+e^-$ study we kept $\beta_E = \beta_D = 50$). For the effect of the anchor-loss coefficients, we always take $\beta_E = \beta_D$ and define a shared hyperparameter $\beta := \beta_E = \beta_D$. We reran the $Z \to e^+e^-$ study with $\beta \in \{0, 10, 20, 50, 100, 200\}$, while keeping $\lambda = 1$ as in the original experiment. We did not repeat this scan for the semileptonic $t\bar{t}$ study, as it does not use an anchor loss. A minimal sketch of such a grid scan is shown below.
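To make the scan concrete, the following schematic shows one way to organize the grid; `train_and_validate` is a hypothetical placeholder (its name and signature are ours) for retraining the model with the given coefficients as in Section 6.3.3 and scoring it on validation data.

```python
# Schematic version of the ablation scans; train_and_validate is a stand-in
# for a full retraining run and only returns a dummy number so the loop runs.
def train_and_validate(lam, beta):
    # Placeholder: retrain as in Section 6.3.3 and return a validation metric.
    return 0.0

lambda_grid = [0.001, 0.01, 0.1, 1, 10, 100, 1000]  # lambda scan (beta fixed at 50)
beta_grid = [0, 10, 20, 50, 100, 200]                # beta scan (lambda fixed at 1)

results = {}
for lam in lambda_grid:
    results[("lambda", lam)] = train_and_validate(lam, beta=50.0)
for beta in beta_grid:
    results[("beta", beta)] = train_and_validate(lam=1.0, beta=beta)
```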
We first consider how the coefficients on the anchor-loss terms, $\beta_E = \beta_D$, affect performance. The anchor losses are direct constraints on the learned encoding and decoding mappings, motivated by physical considerations. Specifically, the anchor loss penalizes networks that would map electron/positron ($e^\mp$) information in $Z$ to positron/electron ($e^\pm$) information in $X$, and vice versa. We impose this constraint because charge misidentification during data reconstruction is extremely rare in particle experiments; for our simulation to be physical, it should not make these unphysical inversions. Unsurprisingly, without this constraint such inversions can occur during training (see Supplementary Figure 2). On the other hand, if the values of $\beta_E = \beta_D$ are too high we also observe unphysical behavior, likely because the anchor loss is only a proxy for enforcing charge conservation.
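As an illustration of the idea (not the exact construction used in our loss), one way to anchor the electron and positron slots between $Z$ and $X$ is a simple slot-wise penalty; the feature layout and the squared-error form below are assumptions made purely for the sketch.

```python
# Hypothetical illustration of an anchor-style penalty that ties the decoded
# electron/positron slots to the corresponding latent slots, discouraging
# charge swaps. The column layout and squared-error form are our assumptions,
# not the paper's exact construction.
import torch

ELECTRON = slice(0, 4)   # assumed columns holding the e- four-momentum
POSITRON = slice(4, 8)   # assumed columns holding the e+ four-momentum

def anchor_penalty(z, x_out):
    """Pull the decoded e-/e+ slots toward the matching latent slots."""
    return (((z[:, ELECTRON] - x_out[:, ELECTRON]) ** 2).mean()
            + ((z[:, POSITRON] - x_out[:, POSITRON]) ** 2).mean())

# Example: a batch of 128 events with 8 latent and 8 reconstructed features.
z, x_out = torch.randn(128, 8), torch.randn(128, 8)
penalty = anchor_penalty(z, x_out)
```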
We next consider the hyperparameter $\lambda$, which is present in both case studies. The behavior of $\lambda$ has theoretical motivations. The WAE method aims to minimize $W_c(p(x), p_D(x))$ by converting its calculation into a constrained optimization problem. It was shown [2] that $W_c(p(x), p_D(x)) = \inf_{p_E(z|x):\, p_E(z) = p(z)} \mathbb{E}[c(X, D(Z))]$ for a deterministic decoder $p_D(x|z) = \delta_{D(z)}(x)$. Namely, we need to minimize a reconstruction error over all probabilistic encoders $p_E(z|x)$ satisfying the latent-space matching condition $p(z) \stackrel{!}{=} p_E(z) := \int p_E(z|x)\,p(x)\,dx$. To make the constrained optimization computationally tractable, the WAE method only softly enforces this constraint via a penalty term $\lambda\, d_z(p(z), p_E(z))$, and considers minimizing the surrogate penalty loss $\mathbb{E}_{p(x)\,p_E(z|x)\,p_D(\tilde{x}|z)}[c(x, \tilde{x})] + \lambda\, d_z(p(z), p_E(z))$ instead.
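The sketch below makes the structure of this surrogate loss concrete, using a Gaussian-kernel MMD as one common choice of $d_z$ and a squared-error reconstruction cost for $c$; both are illustrative stand-ins rather than the exact components of our implementation.

```python
# Minimal sketch of the surrogate penalty loss
#   E[c(x, x~)] + lambda * d_z(p(z), p_E(z)),
# with a Gaussian-kernel MMD standing in for d_z and a squared-error cost for c.
import torch

def mmd(z_encoded, z_prior, sigma=1.0):
    """Simple (biased) MMD^2 estimate between encoded and prior latent samples."""
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * sigma ** 2))
    return (kernel(z_encoded, z_encoded).mean()
            + kernel(z_prior, z_prior).mean()
            - 2 * kernel(z_encoded, z_prior).mean())

def surrogate_loss(x, z_prior, encoder, decoder, lam=1.0):
    z = encoder(x)                        # sample from p_E(z|x)
    x_rec = decoder(z)                    # sample from p_D(x|z)
    recon = ((x - x_rec) ** 2).mean()     # reconstruction cost c(x, x~)
    return recon + lam * mmd(z, z_prior)  # plus the latent-matching penalty
```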
By standard results on penalty methods [3], for a fixed decoder $p_D(x|z)$, globally minimizing the penalty loss with respect to the encoder $p_E(z|x)$ yields a lower bound on $W_c(p(x), p_D(x))$, and solving a sequence of such penalized problems while annealing $\lambda$ towards infinity recovers the exact $W_c(p(x), p_D(x))$. However, when training a WAE it is expensive to repeat this inner optimization after every decoder update, so in practice both the encoder and decoder are optimized jointly on a single penalty loss, with $\lambda$ kept fixed throughout training [2].
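For reference, a textbook penalty-method schedule would look like the following sketch, with the inner optimization stubbed out; as noted above, this annealing is not what is done in practice when training a WAE.

```python
# Textbook penalty-method schedule: solve a sequence of penalized problems
# while growing lambda geometrically. In practice OTUS keeps lambda fixed
# during joint training; this loop only illustrates the annealing procedure.
def minimize_penalized(lam, warm_start=None):
    # Placeholder for "optimize the encoder on the penalty loss at this lambda".
    return {"lambda": lam, "warm_start": warm_start}

lam, growth, n_stages = 1.0, 10.0, 5
solution = None
for _ in range(n_stages):
    solution = minimize_penalized(lam, warm_start=solution)
    lam *= growth  # anneal lambda towards infinity
```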
While the theoretical guarantees of the penalty method no longer apply to the joint stochastic-gradient-descent training procedure used in practice, they do suggest that $\lambda$ should be set as large as possible (and perhaps annealed during training) to better enforce the latent-space matching and thereby better approximate the ideal objective $W_c(p(x), p_D(x))$. Indeed, it was recently proven [4] that perfect latent-space matching, $p_E(z) = p(z)$, is a necessary condition for $W(p(x), p_D(x)) = 0$.
Overall, our ablation experiments confirmed this picture: when $\lambda$ is too small, and thus the penalty on latent-space matching too weak, neither the encoder's nor the decoder's marginal distribution ($p_E(z)$ and $p_D(x)$) captures the ground truth $p(z)$ or $p(x)$ well, despite a minimal reconstruction error. We see this behavior in both test cases, although in the semileptonic $t\bar{t}$ study it is somewhat less dramatic because of the strong initial bias towards an identity mapping induced by the ResNet [1] architecture (see Supplementary Figure 3).